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Metaheuristic algorithms have been widely used to solve 
structural optimization problems. Despite their powerful 
search capabilities, these algorithms often require a large 
number of fitness evaluations. Constructing a machine 
learning classifier to identify which individuals should be 
evaluated using the original fitness evaluation is a great 
solution to reduce the computational cost. However, there is 
still a lack of a thorough comparison between machine 
learning classifiers when integrating into the optimization 
process. This paper aims to evaluate the efficiencies of 
different classifiers in eliminating unnecessary fitness 
evaluations. For this purpose, the weight optimization of a 
double-layer grid structure comprising 200 members is used 
as a numerical experiment. Six machine learning classifiers 
selected for assessment in this study include Artificial Neural 
Network, Support Vector Machine, k-Nearest Neighbor, 
Decision Tree, Random Forest, and Adaptive Boosting. The 
comparison is made in terms of the optimal weight of the 
structure, the rejection rate as well as the computing time. 
Overall, it is found that the AdaBoost classifier achieves the 
best performance. 
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1. Introduction 


Structural optimization has attracted a great deal of interest from both academia and industry. 
Pioneering works of structural optimization were carried out by Maxwell in 1869 [1] and 
Mitchell in 1904 [2]. During the 1960s, Mathematical Programming techniques were applied to 
solve structural optimization problems, beginning with Schmit’s work [3]. In the early 1990s, 
Genetic Algorithm (GA) was firstly used to optimize truss structures [4]. GA is a population- 
based metaheuristic algorithm developed by John Holland in 1975. In comparison with classical 
optimization approaches, metaheuristics have some advantages such as easily escaping from 
local optima, handling discrete variables. During the last four decades, many metaheuristic 
algorithms have been developed and successfully applied to solve structural optimization, for 
example, Genetic Algorithm (GA) [4], Evolution Strategy [5], Differential Evolution (DE) [6], 
Particle Swarm Optimization (PSO) [7], Ant Colony Optimization (ACO) [8], etc. 


Despite their advantages as mentioned above, metaheuristic algorithms often require a large 
number of fitness evaluations to obtain a good solution. In structural optimization, the evaluation 
of the fitness function is often computationally expensive due to conducting time-consuming 
finite element analyses. In the following cases, one potential solution is to construct an 
approximate model that can quickly evaluate fitness function [9-14]. Besides, there is another 
approach that is to employ machine learning (ML) classifiers to identify which individuals 
should be evaluated using the original fitness evaluation. Rosso et al. [15] proposed a hybrid 
method called SVM-PSO for structural optimization in which a Support Vector Machine (SVM) 
model is integrated into the PSO process with the aim of separating feasible and infeasible 
solutions, thereby reducing the search space. The proposed method is applied to solve two 
numerical examples of a simply supported beam and a Warren truss beam. Recently, Nguyen and 
Vu [16] employed Artificial Neural Network (ANN) to classify the safety state of a structural 
solution. The ANN classification model is used in conjunction with the objective function 
comparison for eliminating worse individuals during the DE optimization. An example of a 47- 
bar planar tower is carried out and the results show that the proposed method saves about 20% of 
fitness evaluations. 


It is obvious that there exist many different ML classification algorithms and each algorithm will 
achieve its own percentage of rejection. Although there are numerous comparative studies on the 
performances of ML classifiers in structural problems [17—19], there is still a lack of a thorough 
evaluation of ML classifiers when embedding them into the optimization process. This is the aim 
of this study. 


For this purpose, six commonly used ML classifiers including ANN, SVM, k-Nearest Neighbor 
(kNN), Decision Tree (DT), Random Forest (RF), and Adaptive Boosting (AdaBoost) are 
carefully chosen for the comparison. The weight optimization of a double-layer grid structure of 
200 members is used as a numerical experiment. The comparison is made in terms of the optimal 
weight of the structure, the percentage in eliminating unnecessary fitness evaluations as well as 
the computing time. 
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The remainder of this paper is organized as follows. The paper first describes the technique for 
reducing fitness evaluations using ML classifiers. Six considered ML classifiers are briefly 
introduced in Section 3. In the following section, the comparison is conducted and the findings 
are discussed. Section 5 concludes this paper. 


2. Reduction of fitness evaluations using machine learning classifiers 


2.1. Statement of a structural optimization problem 


Structural optimization problems can be formulated as a constrained optimization problem 
(COP) where the objective function can be the weight or the cost of the whole structure while the 
constraints are often specified in design standards like stress conditions, buckling conditions, 
displacement conditions. The formulation of a structural optimization problem is expressed as 
follows: 


Find x={x,i=1,2,..57} 
to minimize: f (x) = Ş plx, (1) 
i=l 


g, (x) <0, 7 =1,2,...,.m 


subject to: 
<x, <u, 


where: x denotes the design variable vector; x; is the ith design variable; n is the number of 
design variables; f(x) is the objective function; g(x) is the jth constraint; m is the number of 
constraints; /;, u; are lower and upper limits of the variable x;, respectively. 


Metaheuristic algorithms are designed for solving unconstrained optimization problems. To 
handle COPs, a widely-used technique called the penalty method is applied in this study as 
follows: 


F (x) =(1+e,-cv)” x f (x) (2) 


where: F(x) is called the penalty function; cv denotes the degree of constraint violation; ¢, and & 
are two parameters that are carefully selected to ensure a good balance between the exploration 
and the exploitation during the optimization process. In this study, € is set 1 and & is linearly 
increased from 20 to 40 based on the recommendation of Ref. [20]. The degree of constraint 
violation cv is determined by taking a sum of all violated constraints as follows: 


cv = max (0, g ,(x)) (3) 


j=l 


The variables can be the cross-sectional areas of structural members for sizing optimization, or 
the nodal coordinates for shape optimization, or the material distribution for topology 
optimization. This study is limited to sizing optimization. 
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2.2. Technique to reduce fitness evaluations using machine learning classifiers 


The application of machine learning classifiers to eliminate worse individuals during the 
optimization process is not new. This technique has been proposed in Ref. [16] and successfully 
applied to solve the problem of the 47-bar planar tower. The idea behind this technique is based 
on the observation that during the DE optimization, many trial individuals are worse than their 
parent individuals and will lose the pairwise tournament at the selection step. A classification 
model with the ability to detect whether an individual is feasible or not could save many useless 
fitness evaluations. The present study uses the same technique with few enhancements to 
improve efficiency. 


In particular, the optimization process is split into two stages. At stage I, after initializing a 
population Po={x;o|=1,...,.VP}, three operators of the original DE algorithm (mutation, 
crossover, and selection) are sequentially carried out. At the certain iteration g, the current 
population is P,={x; |i=1,...,VP}. The mutation operator creates the mutant individual v; while 
the crossover produces the trial individual u; by taking some components of V;g and the rest 
from X;g. Next, the selection operator chooses the better one among uj, and X;g to enter the next 
iteration. Implementing iterative three operators aims to move the population towards the 
optimum. The DE is a well-established metaheuristic algorithm that has been introduced in many 
previous documents. To avoid wordiness, the formulations of the DE algorithm are not presented 
in this paper. Readers can refer to Ref. [6,14] for more details. 


It is noted that each metaheuristic algorithm always comprises the exploration and the 
exploitation tasks, in which the exploration is the process of expanding bounding regions to find 
new solutions while the exploitation is the process of improving current solutions. The 
implementation of the original DE in the first stage ensures the exploration and collects data for 
training machine learning models. Accordingly, each newly produced individual is evaluated by 
the original fitness function and it is either labeled “+1” if it is feasible or “—1” otherwise: 


Vig =H if cv(x,,)=0 

. . (4) 
Vig =—1 if cv(x,,)>0 
All newly evaluated individuals (X;g, Yig) are saved into the database. After several iterations, a 
training dataset is collected from historical fitness evaluations and an ML-based classification 
model is trained with the aim of classifying the feasibility of individuals. 


In stage II, the label of a trial individual u; produced by two operators mutation and crossover is 
predicted using the classification model just trained at the end of the previous stage. Three 
possible situations can occur as follows: (i) if the predicted label y;prei™t1, the trial individual 
Ujg is then evaluated using the original fitness function; (ii) if the predicted label y;prei™—1 and 
the objective function of the trial individual f(u;,) is smaller than that of the target individual 
KXig), it is also evaluated using the original fitness function; (iii) if the predicted label y; prea =—1 
and f(uj)>f{Xi.g), it is eliminated. Among three situations, there is one case that does not require 
conducting the original fitness evaluation. As a result, the computational cost is significantly 
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reduced. This technique is called the Classification-Assisted Differential Evolution (CADE). 
Both flowcharts of the CADE and the DE are schematized in Fig. 1 to clearly illustrate the 
proposed technique. 
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Fig. 1. Flowcharts: (a) Differential Evolution; (b) Classification-Assisted Differential Evolution. 


3. Review of machine learning classifiers 


Obviously, any ML classifier can be used in the CADE. In this work, six most commonly used 
classifiers that are selected for the comparison include Artificial Neural Network, Support Vector 
Machine, k-Nearest Neighbor, Decision Tree, Random Forest, and Adaptive Boosting. In the 
following subsections, these classifiers are briefly introduced. 


3.1. Artificial neural network (ANN) 


The ANN model comprises many nodes which are arranged into three types of layer: input layer, 
hidden layers, and output layer (Fig. 2). In this model, the weighted sum of outputs from nodes 
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of the previous layer becomes the input for nodes of the current layer. At each node, the input is 
transformed into the output by the activation function like the sigmoid function, the tanh 
function, the softmax function, or the ReLU function. These nonlinear functions will allow the 
model to create a complex mapping between the inputs and the outputs of the data. The model is 
trained using a technique called the back-propagation algorithm for minimizing the error 
between the predictions and ground truth values. ANN models can well handle both regression 
and classification tasks. For classification tasks, the cross-entropy is frequently used to measure 
the error. 


Input Hidden Hidden Output 
layer layer 1 layer 2 layer 
Fig. 2. Artificial Neural Network. 


3.2. Support vector machine (SVM) 


The goal of this algorithm is to find an optimal hyperplane in multi-dimensional space that can 
separate data points into distinct regions. Fig. 3 illustrates the SVM model for a binary 
classification problem with two features. There are, in fact, many hyperplanes that could be 
detected. The optimal hyperplane is the one having the largest margin. 


small margin 


support 
vectors 


Fig. 3. Support Vector Machine. 
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3.3. k-Nearest neighbor (KNN) 


The KNN is categorized as a “lazy learner” algorithm in which all data points are stored in the 
database without conducting any training process. When a prediction is requested, the algorithm 
will search k points that are closest to the considered point, and then the prediction is determined 
by taking a plurality vote of newly detected neighbors. In addition, a useful technique can be 
applied to improve the accuracy where distance-based weights are assigned to neighbors. This 
technique ensures that nearer neighbors will contribute more to the prediction than that of far 
neighbors. Euclidean, Manhattan, and Minkowski distances are three distance functions that are 
commonly used in the KNN. Fig. 4 displayed a simply KNN model with two cases of k=3 and 
k=7. 


Class B New point 
to classify 


Class A 


> 


Fig. 4. k-Nearest Neighbor. 


3.4. Decision tree (DT) 


The DT model is a tree-like graph that is frequently used to split data into smaller subsets. A 
simple DT model is visualized in Fig. 5, in which, each internal node denotes a test on an 
attribute, and each branch represents an outcome of the test and each leaf node holds a class 
label. The quality of a split can be measured using the information gain or the Gini impurity. 


No Yes 


No node 


Yes 


Leaf Leaf 
node node 


Fig. 5. Decision Tree. 


3.5. Random forest (RF) 


There is an effective way to enhance the classification accuracy by using many DT models 
simultaneously. Each DE model is trained by a random subset from the training data. The final 
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output of the prediction is obtained through the majority vote. This technique is called bagging 
and this is very efficient to improve the stability for unstable algorithms like DT or ANN. One of 
the most bagging-type methods is the RF where only a few features are randomly chosen to train 
the DT model. An illustration of the concept of the RF is presented in Fig. 6. 


( Training Dataset 
Pi 


Tree 2 Tree 3 
O 
$ 4 


Output 1 Output 2 Output 3 
e Vote 


Final Prediction 


Fig. 6. Random Forest. 


3.6. Adaptive boosting (AdaBoost) 


Boosting is also a method for reducing the variance and the bias of machine learning models. 
Like bagging, boosting combines several weak classifiers to create a strong classifier. The main 
difference between the two methods is that weak classifiers in bagging are built independently 
while weak models in boosting are trained sequentially. There exist many boosting algorithms, of 
which the AdaBoost is the first one. This algorithm is proposed by Schapire and Freund in 1997 
[21]. Fig. 7 illustrates the initial idea of the AdaBoost. 


Original Dataset Weighted Dataset 


Strong classifier 


Weak classifier #1 Weak classifier #2 
Fig. 7. Adaptive Boosting. 
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4. Comparison 


In this section, a numerical example of a real-size structure is carried out using both the original 
DE and the CADE. As mentioned above, the CADE can be used in conjunction with any 
classification algorithm. Six ML classifiers selected in Section 3 will be employed in turn and 
their efficiencies are compared thoroughly. 


4.1. Numerical example 


The example considered in this work is a 10x10m double-layer grid structure consisting of 200 
bars. The aim of this example is to find the cross-sectional areas of members for minimizing the 
weight of the structure. The formulation of this problem is expressed as follows: 


Find A=(A,A,K,A,, | 
n, nm(i) 
to minimize: W(A)= p>. A, X L, (5) 
i=l j=l 
Wen i (A) <0,k =1,2,...,nc 
A, ES 


in which: A; is the cross-sectional area of members of the ith group; ng is the number of groups; 
L; is the length of jth member; nm(i) is the number of members which belongs to the ith group; p 
is the density of the material used in this structure; g,(A) denotes the kth constraint; nc is the 
number of constraints; S represents the list of available profiles. 


In this problem, members are divided into ng=9 groups as presented in Fig. 8. All members are 
fabricated from the material steel having mechanical properties as follows: the density p=7850 
kg/m’; the modulus of elasticity E=2.1E+10 kg/m’; and the yield strength Fy=3.515E+07 kg/m’. 
This structure subjects to a load of g=250 kgf/m’ on the top surface. The list of available profiles 
S is described in Table 1. There are three types of constraints: tensile stress constraint, 
compressive stress constraint, slenderness ratio constraint, and displacement constraint. The 
tensile stress constraint is as follows: 


o,<¢F, 4, =09 (6) 


where: œ; is the tensile stress of the member; F; is the yield strength. 
The compressive stress constraint is as follows: 


o, SQF; 


Cc” cr? 


g. =0.85 (7) 


where: o is the compressive stress of the member; Fo is the critical stress of the corresponding 
member which is calculated based on the following formula: 
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(0.658% | if KL/r< 4.71, JE/F, 
Eg l (8) 


0.877F, if KL/r >4.71[EJF, 


where: F= 7 E/(KL/r)’; K is the effective length factor which equals 1.0 in this case; L is the 
length of the member; and r is the radius of gyration of the member’s cross-section 


The limitations of the slenderness ratio (KZ/r) are 200 for compression members while 300 for 
tension members. Finally, the allowable vertical displacement of this structure equals 7.5 cm. 


Sas SSS 4500 


a 10000 TEA 


Fig. 8. Configuration of the 10x10m double layer grid structure. 


Table 1 
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List of available profiles. 


No. Profile A(cm’) | r(cm) | No. Profile A(cm’) | r (cm) 
1 33.70x2.6 2.54 1.1 14 159.0x4.0 19.48 | 5.4814 
2 48.30x2.6 3.73 1.62 15 168.3x4.0 | 20.65 | 5.8102 
3 60.30x3.2 5.74 2.02 16 193.7x4.5 26.75 | 6.6922 
4 76.10x3.2 7.329 | 2.5799 | 17 219.1x5.0 33.63 | 7.5716 
5 82.50x3.2 7.972 2.806 18 244.5x5.4 | 40.56 | 8.4557 
6 88.90x3.2 8.616 | 3.0321 | 19 273.0x5.6 | 47.04 | 9.457 
7 101.6x3.6 11.08 | 3.4672 | 20 298.5x5.9 54.23 | 10.3471 
8 108.0x3.6 11.81 3.6934 | 21 323.9x5.9 58.94 | 11.245 
9 114.3x3.6 12.52 | 3.9161 | 22 355.6x6.3 69.13 | 12.3536 
10 127.0x4.0 15.45 | 4.3504 | 23 368.0x6.3 71.59 | 12.7895 
11 133.0x4.0 16.21 | 4.5629 | 24 406.4%6.3 79.19 | 14.1475 
12 139.7x4.0 17.05 | 4.8004 | 25 419.0x7.1 91.88 | 14.5645 
13 152.4x4.0 18.65 | 5.2483 | 26 457.2x7.1 100.4 | 15.915 


4.2. Experimental setup 


67 


This problem is categorized as a discrete optimization problem in which design variables are 
selected among a limited number of values. In this work, a simple technique is employed to 


handle discrete variables. The design variable 4;eS in Eq. (5) are replaced by an integer 
l;e {1,2,...,26} representing the sequence number of the considered profile in the list S. During 
the optimization process, a continuous value that is newly generated by the mutation and 


crossover operators is rounded to the nearest integer value. The cross-sectional area A; and the 
radius of gyration r; corresponding to the position J; are looked up from Table 1. Next, their 
values are used to evaluate the feasibility of the solution as well as save as the inputs of the 
training data. According to the evaluation result, this solution is labeled and the value is saved as 
the output of the training data. The data processing flow is illustrated in Fig. 9. 


l Ai} nm 
h m A2 
Ty Ag | r9 


x 


4 Feasible solution (cv=0) 


Infeasible solution (cv>0) > 


Inputs 


Output 


xXı=Å;/rı 


X2=Á/Tr2 


X9=Ad/Tro y < 


Fig. 9. Data processing flow. 
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Both the DE and the CADE use “DE/current-to-best/1” which is recognized as one of the most 
effective mutation strategies. For a fair comparison, two algorithms are set with the same 
parameters as follows: the number of individuals NP=25, the scale factor F=0.8, the crossover 
rate Cr=0.9, and the maximum iterations 2i¢,=100. In particular, the number of the first stage of 
the CADE is set to g:=20 which means the training dataset consists of 500 samples. Both the DE 
and the CADE algorithms are written in Python language. The finite element analysis code is 
developed based on the direct stiffness method. All ML models are constructed using the open- 
source library scikit-learn [22]. Optimal parameters of these models that were found after the 
hyperparameter tuning process are presented in Table 2. 


Table 2 
Summary of ML model parameters. 
Parameters 
activation='relu', solver='adam', batch_size=10, max_iter=1000, 
AA hidden_layer_sizes=(200,200,200,), shuffle=True, early_stopping=True 
SVM kernel='rbf', C=1000, gamma=0.1 
KNN n_neighbors=50 
DT criterion='gini', max_depth=None 
n_estimators=50, max_depth=None, max_features='auto', 
i max_leaf_nodes=None, min_impurity_decrease=1e-07, bootstrap=True 
AdaBoost DecisionTreeClassifier(max_depth=1), n_estimators=50 


The performance of machine learning models is evaluated through the accuracy metric. Due to 
the limited number of data samples, a widely-used technique called k-fold cross-validation is 
employed. In more detail, the training dataset is split into k subsets, in which the model is trained 
on (k-1) parts and then the trained model is validated on the remaining one. The process is 
repeadly carried out for k times. In this study, the parameter k is set to 5 and the mean values of 
the obtained accuracies are reported in Table 3. Next, the model is re-trained with the full 
training dataset and it will be embedded into the DE optimization. Particularly for the ANN 
model, the early stopping technique is used for preventing the overfitting phenomena. This 
feature is available in the library scikit-learn and it is enabled by the setting 
“early_stopping=True” when defining the model. 


Table 3 
Accuracies of ML models. 
ANN SVM KNN DT RF AdaBoost 
Accuracy 0.775 0.757 0.651 0.884 0.903 0.905 


It is clearly seen that two ensemble models including the RF and the AdaBoost reach high 
accuracy (over 90%), followed by the DT (88.4%). The ANN and the SVM achieve an accuracy 
of approximately 75% while the KNN has the worst performance with an accuracy of 65.1%. The 
observation is consistent with the findings of Ref. [19]. 


T.H. Nguyen, A.T. Vu/ Journal of Soft Computing in Civil Engineering 5-4 (2021) 57-73 69 


4.3. Results 


These are a total of seven cases that are carried for comparison. In general, the name of each case 
has the same form “A(B)” in which A represents the optimization algorithm and B denotes the 
ML classifier. For example, the case “CADE(AdaBoost)” means that the CADE is used to 
optimize the weight of the structure while the AdaBoost classifier is employed to reduce fitness 
evaluation. In particular, the name “DE” particularly denotes that the original DE is employed in 
this case. Each case is carried out 30 times. The best results in 30 independent runs are presented 
in Table 4. It is noted that all six cases of the CADE are performed, however, the best designs 
found by them are the same. Therefore, Table 4 presents one case of CADE. Additionally, the 
design found by Gholizadeh [23] using the commercial software SAP2000 is also reported to 
demonstrate the effectiveness of the proposed method. Fig. 10 shows the convergence curves of 
seven cases. The values indicated in Fig. 10 are averages of 30 runs. To compare the efficiency 
of six classifiers in reducing fitness evaluations, five metrics are used including the minimum 
weight (best), the average weight (mean), the standard deviation (SD), the average number of 
fitness evaluations (nFE), and the average computing time (time) of 30 independent runs. The 
obtained values of five metrics for six cases are reported in Table 5. 


Table 4 
Best results for the double-layer grid structure. 
Gholizadeh [23] DE CADE 
A 76.10x3.2 76.10x3.2 76.10x3.2 
Ay 101.6x3.6 101.6x3.6 101.6x3.6 
A3 33.70x2.6 33.70x2.6 33.70x2.6 
A4 101.6x3.6 101.6x3.6 101.6x3.6 
As 60.30x3.2 60.30x3.2 60.30x3.2 
As 33.70x2.6 33.70x2.6 33.70x2.6 
Ay 76.10x3.2 76.10x3.2 76.10x3.2 
Ag 48 .30x2.6 33.70x2.6 33.70x2.6 
Ay 33.70x2.6 48.30x2.6 48.30x2.6 
Weight (kg) 1702.73 1683.904 1683.904 
Table 5 
Evaluation metrics of six classifiers. 
best (kg) mean (kg) SD (kg) nFE time (s) 
CADE(ANN) 1683.904 1686.596 6.151 2095 80.6 
CADE(SVM) 1683.904 1719.392 155.850 1820 66.7 
CADE(KNN) 1683.904 1687.766 7.193 1818 45.4 
CADE(DT) 1683.904 1691.031 12.298 2030 43.7 
CADE(RF) 1683.904 1691.710 11.002 1819 54.5 
CADE(AdaBoost) 1683.904 1687.218 6.762 1743 48.9 
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Fig. 10. Convergence curves of seven cases. 


4.4. Comparison and discussion 


Based on the results described in Table 5, six CADE cases are ranked according to each metric. 
Next, their ranks are displayed on a radar chart as shown in Fig. 11. Some observations are as 
follows. 


AdaBoost 
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DT 


Mea i Time 


Fig. 11. Comparison of six ML classifiers in the CADE. 
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First of all, it can be seen that the weights of the grid structure found by the DE and the CADE 
(1683.904 kg) are smaller than that of Gholizadeh (1702.73 kg). It demonstrates the effectiveness 
of applying metaheuristic algorithms in structural optimization. 


Secondly, Fig. 10 indicates that the convergence speeds of all CADE cases are faster than the 
original DE. Specifically, the DE finishes after 2500 iterations while six CADE cases converge 
after about 1700 to 2100 iterations. Among six cases, the CADE(AdaBoost) is fastest with the 
average number of fitness evaluations of 1743, followed by the CADE(KNN) with 1818 
evaluations, the CADE(RF) with 1819 evaluations, the CADE(SVM) with 1820 evaluation, the 
CADE(DT) with 2030 evaluations, and the CADE(ANN) with 2095 evaluations. In comparison 
with the original DE, the reduction rates of six cases are 30.3%, 27.3%, 27.2%, 27.2%, and 
16.2%, respectively. 


However, there is an inconsistency when comparing the number of fitness evaluations and the 
computing time. Although the CADE(DT) requires an average of 2030 evaluations, the total time 
of the CADE(DT) is 43.7s, ranking first in 6 cases. Next is the CADE(kNN), the 
CADE(AdaBoost), the CADE(RF), the CADE(SVM), and the CADE(ANN). The reason is that 
the training speed of the DT model is very fast. It should be noted that for large-scale structures 
where the time for each finite element analysis is much larger than the training model time, the 
above ranking can be changed. 


Comparing in terms of the optimal weight, the CADE(ANN) is superior to others. In more detail, 
the mean value and the SD value of the CADE(ANN) are 1686.596 kg and 6.151 kg, 
respectively. The following cases are: the CADE(AdaBoost) with mean=1687.218 kg and 
SD=6.762 kg; the CADE(kKNN) with mean=1687.766 kg and SD=7.193 kg, the CADE(DT) with 
mean=1691.031 kg and SD=12.298 kg, the CADE(RF) with mean=1691.71 kg and SD=11.002 
kg, and the CADE(SVM) with mean=1719.392 kg and SD=155.85 kg. 


Overall, it is apparent that the application of ML classifiers into the optimization process 
significantly reduces the number of fitness evaluations. The reduction rate ranges from 16% to 
30%. The CADE(AdaBoost) achieves the best performance when balancing five evaluation 
metrics. 


5. Conclusions 


This paper presents a hybrid method called CADE with the aim of reducing the number of fitness 
evaluations of metaheuristic algorithms. In this method, an ML classifier is constructed based on 
historical fitness evaluations in order to predict the feasibility of newly produced individuals. In 
later generations, the ML classifier is employed as a filter to eliminate worse individuals, 
thereby, omitting many useless fitness evaluations. 


For comparing the efficiencies of ML classifiers, a 10x10m double-layer grid structure is 
optimized using six cases of the CADE with six different ML classifiers. The results show that 
applying ML classifier rejects approximately 16 to 30% of fitness evaluations. Among six 
compared ML classifiers, the AdaBoost gives the best performance. With lower computationally 
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cost, the AdaBoost classifier-assisted Differential Evolution is very effective when optimizing 
large-scale structures. 
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